My project aims to summarize and visualize the trends in the use of animals in biomedical research by species, pain levels and state in the US the over the period from 2008 to 2019. The data is obtained from the US Department of Agriculture’s APHIS annual reports. Following this exploratory analysis, a prediction for the number of animals that will be used over the next 5 years will be made.
The use of animals in biomedical research is ubiquitous. There are two main categories for animal use: basic biomedical sciences research and (mimicking a human disease in an animal model and studying gene expression, molecular mechanisms, etc.) and drug testing (toxicity and efficacy of an experimental drug). A growing body of studies and voices from the scientific community have pointed out the poor reliability and predictive value of animal models for human outcomes and for understanding human physiology. In 2004, the FDA estimated that 92% of drugs that pass preclinical tests in animal models fail in human clinical trials. More recent analysis suggests that, in spite of efforts to improve the predictability of animal testing, the failure rate has increased to 96%.
In 1938, Congress passed the U.S. Federal Food, Drug, and Cosmetic Act, mandating animal toxicity testing. As of October 7, 2021, Congress introduced the bipartisan FDA Modernization Act to end animal testing mandates. This comes after the European Parliament resoundingly passed a resolution on September 16, 2021, with a vote of 667 to 4 to phase out animal testing. How has animal use in research changed over the past few years in the US? What is the significance of species, state, pain levels? Using the limited USDA data, I aim to answer these questions. Furthermore, a regression model will predict the number of animals that will be needed in the next few years if the same trends continue. This analysis could be of interest to biomedical companies to reduce their time and resources spent on animal models, FDA for revision of regulatory requirements, bioethics specialists and animal advocacy groups.
The trends in animal use could be influenced by technologies that enhanced the ease of building animal models, technologies that performed better than animal models, change in bioethical standards, change in public sentiment about the topic and even opinions voiced by the heads of government regulatory bodies. I am personally interested in this analysis because one of my career goals is to work towards developing and commercializing alternatives to animal methods in biomedical research.
USDA publishes an annual report of the number of animals used by state, species and pain category. The report for every year is published in January two years later. Hence the latest data I could obtain was from 2019. The reports were published starting 2008, hence I have data for a 12 year period.
The data is in the form of several PDFs. I cleaned the data manually into .txt format. The nomenclature is year_col_X where X is the pain category. The pain categories are: - Column B: Animals held by a facility but not used in any research that year - Column C: Animals used in research; no pain involved; no pain drugs administered - Column D: Animals used in research; pain involved; pain drugs administered - Column E: Animals used in research; pain involved; no pain drugs administered.
The following code chunk loads the data as a dataframe, stacked by year and column. The loaded data has 53 rows per year per column. This includes the 50 states, District of Columbia, Puerto Rico, and “REPORT TOTAL” which is the sum of all columns (species). Note that the species column is not an exhaustive list of all animals used in biomedical research but only those covered by the Animal Welfare Act. The AWA does not cover rats, mice, birds and reptiles, which happen to be over 95% of all animals used. Columns B, C, D and E are assigned values 0, 1, 2 and 3 to reflect pain levels in a more intuitive way.
library("tidyverse")
library("dplyr")
datafr <- purrr::map_dfr(
.x = c(2008:2019),
.f = function(x){
purrr::map_dfr(
.x = c("B", "C", "D", "E"),
.f = function(x, y){
dat <- read.table(file = paste0("C:/Users/Deeksha Hegde/Downloads/BMIN503_Final_Project/USDA_Data/",y,"_col_",x,".txt"), header = TRUE, sep = "\t")
dat %>%
mutate(across(.cols = All_Other_Covered_Species:Total, .fns =~ as.numeric(str_remove_all(string = .x, ","))),
Year = y,
Column = x
)
}, y=x
)
}
)
#Checking if the data loaded is correct
datafr %>% count(Year, Column)
## Year Column n
## 1 2008 B 53
## 2 2008 C 53
## 3 2008 D 53
## 4 2008 E 53
## 5 2009 B 53
## 6 2009 C 53
## 7 2009 D 53
## 8 2009 E 53
## 9 2010 B 53
## 10 2010 C 53
## 11 2010 D 53
## 12 2010 E 53
## 13 2011 B 53
## 14 2011 C 53
## 15 2011 D 53
## 16 2011 E 53
## 17 2012 B 53
## 18 2012 C 53
## 19 2012 D 53
## 20 2012 E 53
## 21 2013 B 53
## 22 2013 C 53
## 23 2013 D 53
## 24 2013 E 53
## 25 2014 B 53
## 26 2014 C 53
## 27 2014 D 53
## 28 2014 E 53
## 29 2015 B 53
## 30 2015 C 53
## 31 2015 D 53
## 32 2015 E 53
## 33 2016 B 53
## 34 2016 C 53
## 35 2016 D 53
## 36 2016 E 53
## 37 2017 B 53
## 38 2017 C 53
## 39 2017 D 53
## 40 2017 E 53
## 41 2018 B 53
## 42 2018 C 53
## 43 2018 D 53
## 44 2018 E 53
## 45 2019 B 53
## 46 2019 C 53
## 47 2019 D 53
## 48 2019 E 53
datafr <- mutate(datafr, pain.level = factor(Column, levels = c("B", "C", "D", "E"), labels = c(0, 1, 2, 3)))
Now let us visualize the numbers! The first plot shows the total animals (all categories combined) by year. The second plot shows the same but with the breakup by column.
data.total <- filter(datafr, State == "REPORT TOTAL")
#Total by year
data.total1 <- summarise(group_by(data.total, Year), Total = sum(Total))
ggplot(data = data.total1, aes(x = Year, y = Total)) + geom_line() + scale_x_continuous(breaks = c(2008:2019)) + scale_y_continuous(breaks = seq(900000, 1300000, by = 100000), labels = seq(9, 13, 1)) + xlab(label = "Year") + ylab(label = "Total in hundred thousands") + ggtitle("Total number of animals vs. Year") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
#Total by year and column
total.plot <- ggplot(data = data.total, aes(x = Year, y = Total, color = pain.level)) + geom_line() + scale_x_continuous(breaks = c(2008:2019)) + scale_y_continuous(breaks = seq(100000, 700000, by = 100000), labels = seq(1, 7, 1)) + xlab(label = "Year") + ylab(label = "Total in hundred thousands") + ggtitle("Total number of animals vs. Year by Pain Levels") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
total.plot
We see that there is a huge rise in the total from 2009 to 2010, and a steady decline since then. Overall, there is a significant decline over the 12 year period.
From the second plot, we see that pain category 1, which involves no pain and no pain drugs, has the highest number throughout.
#Total by column by state
data.by.state <- datafr %>%
filter(State != "REPORT TOTAL") %>%
group_by(State, pain.level) %>%
summarise(Total = sum(Total))
data.by.state.total <- group_by(data.by.state, State) %>%
summarize(Total.all = sum(Total))
ggplot(data.by.state.total, aes(x = State, y = Total.all)) + geom_point(size = 4) +
theme(
plot.title = element_text(size = 22, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust = 0.5),
axis.text.y = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
ggplot(data.by.state, aes(x = State, y = Total, color = pain.level)) + geom_point(size = 4) +
theme(
plot.title = element_text(size = 22, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust = 0.5),
axis.text.y = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
From the graphs, we get the following information:
States using the highest number of animals overall across all pain levels: CA, MA, NJ
States using the highest number of animals by pain level: - 0 CA, MD, TX - 1 CA, MA, OH - 2 CA, TX, MA - 3 MO, MI, IA
An arguably better representation of the same data is the use of interactive maps.
library("sf")
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library("spData")
## Warning: package 'spData' was built under R version 4.1.2
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
usa <- st_read("C:/Users/Deeksha Hegde/Downloads/us-state-boundaries/us-state-boundaries.shp") %>%
rename(State = stusab)
## Reading layer `us-state-boundaries' from data source
## `C:\Users\Deeksha Hegde\Downloads\us-state-boundaries\us-state-boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44069
## Geodetic CRS: WGS 84
head(usa)
## Simple feature collection with 6 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.8485 ymin: 14.03656 xmax: 146.1544 ymax: 49.00241
## Geodetic CRS: WGS 84
## gid arealand division intptlat
## 1 16 278176477 0 18.21765
## 2 23 472276664 0 14.93678
## 3 31 1627312771 7 34.89553
## 4 35 2136109036 5 38.64729
## 5 40 -1616974352 1 41.59742
## 6 54 312831514 9 47.40732
## name objectid areawater intptlon
## 1 Puerto Rico 50 628200285 -66.41080
## 2 Commonwealth of the Northern Mariana Islands 36 349301029 145.60102
## 3 Arkansas 44 -1334552525 -92.44463
## 4 West Virginia 1 489848791 -80.61833
## 5 Rhode Island 6 1323457457 -71.52727
## 6 Washington 20 -324557627 -120.57580
## oid funcstat centlon State state statens centlat
## 1 303146031 A -66.41476 PR 72 01779808 18.21647
## 2 -1625647860 A 145.59687 MP 69 01779809 16.79744
## 3 266078934 A -92.44270 AR 05 00068085 34.89402
## 4 -1929409300 A -80.62346 WV 54 01779805 38.64119
## 5 -1861167639 A -71.52472 RI 44 01219835 41.59403
## 6 -1859906639 A -120.59557 WA 53 01779804 47.41490
## basename mtfcc region lsadc geoid
## 1 Puerto Rico G4000 9 00 72
## 2 Commonwealth of the Northern Mariana Islands G4000 9 00 69
## 3 Arkansas G4000 3 00 05
## 4 West Virginia G4000 3 00 54
## 5 Rhode Island G4000 1 00 44
## 6 Washington G4000 4 00 53
## geometry
## 1 MULTIPOLYGON (((-67.20794 1...
## 2 MULTIPOLYGON (((145.5726 15...
## 3 MULTIPOLYGON (((-94.55218 3...
## 4 MULTIPOLYGON (((-81.74725 3...
## 5 MULTIPOLYGON (((-71.7897 41...
## 6 MULTIPOLYGON (((-123.2479 4...
to_map <- inner_join(usa, data.by.state.total, by = "State")
library(mapview)
library(leaflet)
pal_fun <- colorNumeric("BuPu", NULL)
pu_message <- paste0(to_map$State,
"<br>Number of animals used: ",
prettyNum(to_map$Total.all, big.mark = ","))
leaflet(to_map) %>%
addPolygons(stroke = FALSE, # remove polygon borders
fillColor = ~pal_fun(Total.all),
fillOpacity = 0.8, smoothFactor = 0.5,
popup = pu_message) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLegend("bottomright",
pal=pal_fun,
values=~Total.all,
title = 'Total number of animals used',
opacity = 1) %>%
addScaleBar()
library("sf")
library("spData")
usa <- st_read("C:/Users/Deeksha Hegde/Downloads/us-state-boundaries/us-state-boundaries.shp") %>%
rename(State = stusab)
## Reading layer `us-state-boundaries' from data source
## `C:\Users\Deeksha Hegde\Downloads\us-state-boundaries\us-state-boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44069
## Geodetic CRS: WGS 84
head(usa)
## Simple feature collection with 6 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.8485 ymin: 14.03656 xmax: 146.1544 ymax: 49.00241
## Geodetic CRS: WGS 84
## gid arealand division intptlat
## 1 16 278176477 0 18.21765
## 2 23 472276664 0 14.93678
## 3 31 1627312771 7 34.89553
## 4 35 2136109036 5 38.64729
## 5 40 -1616974352 1 41.59742
## 6 54 312831514 9 47.40732
## name objectid areawater intptlon
## 1 Puerto Rico 50 628200285 -66.41080
## 2 Commonwealth of the Northern Mariana Islands 36 349301029 145.60102
## 3 Arkansas 44 -1334552525 -92.44463
## 4 West Virginia 1 489848791 -80.61833
## 5 Rhode Island 6 1323457457 -71.52727
## 6 Washington 20 -324557627 -120.57580
## oid funcstat centlon State state statens centlat
## 1 303146031 A -66.41476 PR 72 01779808 18.21647
## 2 -1625647860 A 145.59687 MP 69 01779809 16.79744
## 3 266078934 A -92.44270 AR 05 00068085 34.89402
## 4 -1929409300 A -80.62346 WV 54 01779805 38.64119
## 5 -1861167639 A -71.52472 RI 44 01219835 41.59403
## 6 -1859906639 A -120.59557 WA 53 01779804 47.41490
## basename mtfcc region lsadc geoid
## 1 Puerto Rico G4000 9 00 72
## 2 Commonwealth of the Northern Mariana Islands G4000 9 00 69
## 3 Arkansas G4000 3 00 05
## 4 West Virginia G4000 3 00 54
## 5 Rhode Island G4000 1 00 44
## 6 Washington G4000 4 00 53
## geometry
## 1 MULTIPOLYGON (((-67.20794 1...
## 2 MULTIPOLYGON (((145.5726 15...
## 3 MULTIPOLYGON (((-94.55218 3...
## 4 MULTIPOLYGON (((-81.74725 3...
## 5 MULTIPOLYGON (((-71.7897 41...
## 6 MULTIPOLYGON (((-123.2479 4...
pl0 <- filter(data.by.state, pain.level == 0)
to_map_0 <- inner_join(usa, pl0, by = "State")
library(mapview)
library(leaflet)
pal_fun <- colorNumeric("BuPu", NULL)
pu_message <- paste0(to_map_0$State, # paste0 to append tract name with other relevant text
"<br>Number of animals used: ", # <br> forces new line
# use round function to round continuous poverty rate to one decimal point
prettyNum(to_map_0$Total, big.mark = ","))
leaflet(to_map_0) %>%
addPolygons(stroke = FALSE, # remove polygon borders
fillColor = ~pal_fun(Total),
fillOpacity = 0.8, smoothFactor = 0.5, # increase opacity and resolution
popup = pu_message) %>%
addProviderTiles(providers$CartoDB.Positron) %>% # add third party provider tile
#addProviderTiles(providers$Stamen.Toner) %>%
#addProviderTiles(providers$Esri.NatGeoWorldMap)
addLegend("bottomright", # location of legend
pal=pal_fun, # palette function
values=~Total, # variable to be passed to palette function
title = 'Total number of animals used at pain level 0', # legend title
opacity = 1) %>% # legend opacity (1 = completely opaque)
addScaleBar()
Repeat for pain levels 1, 2, 3. How do I make all maps appear in one grid? Different colors for pain levels or same? How to make a loop/function to repeat for all pain levels?
Now, since we’re in Pennsylvania, let us see what the plots for PA looks like.
#Total for PA
data.PA <- filter(datafr, State == "PA")
ggplot(data = summarise(group_by(data.PA, Year), Total = sum(Total)), aes(x = Year, y = Total)) + geom_line() + scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year in Pennsylvania")
#Total by column for PA
ggplot(data = data.PA, aes(x = Year, y = Total, color = pain.level)) + geom_line() +
scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year by Pain Levels in Pennsylvania")
We see a huge decline from 2009 to 2015.
Next, I want to find out which states contributed the most to the decline observed in the first plot. I calculated the mean and standard deviation over the years for each state and sorted them in decreasing order of SD. Noticing that Arizona seems to have a higher SD than mean, I calculated the coefficient of variability and sorted based on this.
library(dplyr)
#Std dev of total for all states
data.by.state.year <- datafr %>%
filter(State != "REPORT TOTAL") %>%
group_by(State, Year) %>%
summarise(Total = sum(Total))
state.std.dev <- summarise(data.by.state.year, mean = mean(Total), sd = sd(Total))
state.std.dev[order(state.std.dev$sd, decreasing = TRUE),]
## # A tibble: 52 x 3
## State mean sd
## <chr> <dbl> <dbl>
## 1 AZ 16261. 35984.
## 2 NY 45696. 25676.
## 3 CA 103133. 22675.
## 4 KS 11863. 20574.
## 5 PA 50591. 20247.
## 6 OH 64393. 18341.
## 7 IA 30172 15683.
## 8 NJ 67210. 13946.
## 9 MO 36277. 13905.
## 10 MD 58124. 13290.
## # ... with 42 more rows
state.std.dev <- state.std.dev %>% mutate(cv = sd/mean) %>%
arrange(desc(cv))
state.std.dev
## # A tibble: 52 x 4
## State mean sd cv
## <chr> <dbl> <dbl> <dbl>
## 1 AZ 16261. 35984. 2.21
## 2 KS 11863. 20574. 1.73
## 3 WV 2116. 2936. 1.39
## 4 NE 4406. 3886. 0.882
## 5 AK 636. 468. 0.735
## 6 HI 337 227. 0.675
## 7 DC 7995. 5279. 0.660
## 8 WY 543. 307. 0.565
## 9 NY 45696. 25676. 0.562
## 10 IN 17136. 9132. 0.533
## # ... with 42 more rows
The states having CV > 1 will be examined further.
#Total by column for AZ
data.AZ <- filter(datafr, State == "AZ")
ggplot(data = data.AZ, aes(x = Year, y = Total, color = Column)) + geom_line()+
scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year by Pain Levels in Arizona")
Column C of the year 2010 is clearly an outlier. I verified the data again from the USDA report. It is most likely a typographical error. On examination, I found that the outlier is coming from the All_Other_Covered_Species. I first replaced that value with NA and found the mean along the column for the rest of the years. I replaced the NA with the rounded mean and finally updated the total. The updated plot for AZ is also shown.
#Setting the outlier point to NA
data.AZ[, 2][data.AZ['Column'] == "C" & data.AZ['Year'] == 2010] <- NA
#Setting the outlier point to the mean of the column over the years excluding outlier year
data.AZ[, 2][data.AZ['Column'] == "C" & data.AZ['Year'] == 2010] <- round(mean(data.AZ[, 2][data.AZ['Column'] == "C"], na.rm = TRUE), digits = 0)
#Updating the Total column for the year
data.AZ[, 'Total'][data.AZ['Column'] == "C" & data.AZ['Year'] == 2010] <- sum(data.AZ[2:12][data.AZ['Column'] == "C" & data.AZ['Year'] == 2010])
#Updated plot with outlier adjusted
ggplot(data = data.AZ, aes(x = Year, y = Total, color = Column)) + geom_line()+
scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year by Pain Levels in Arizona (Outlier adjusted)")
The second state having CV > 1 is Kansas. Let’s look at its plot.
#Total by column for KS
data.KS <- filter(datafr, State == "KS")
ggplot(data = data.KS, aes(x = Year, y = Total, color = Column)) + geom_line()+
scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year by Pain Levels in Kansas")
2019 column C is likely to be an outlier but unlike the case with Arizone, we cannot be sure since we don’t have the data for the years after 2019.
Finally, we have West Virginia.
#Total by column for WV
data.WV <- filter(datafr, State == "WV")
ggplot(data = data.WV, aes(x = Year, y = Total, color = Column)) + geom_line()+
scale_x_continuous(breaks = c(2008:2019)) + ggtitle("Total number of animals vs. Year by Pain Levels in West Virginia")
WV does not seem to have outliers, since there is an increase and decrease in all columns over 4 years from 2015-2019.
Now that we have investigated outliers, let us go back to the Total number of animals vs. Year plots and revise them.
datafr[datafr['State'] == "AZ" & datafr['Column'] == "C" & datafr['Year'] == 2010, ] <- data.AZ[data.AZ['Column'] == "C" & data.AZ['Year'] == 2010, ]
datafr[530, 2:13] <- colSums(datafr[478:529, 2:13])
data.total <- filter(datafr, State == "REPORT TOTAL")
#Total by year
data.total1 <- summarise(group_by(data.total, Year), Total = sum(Total))
ggplot(data = data.total1, aes(x = Year, y = Total)) + geom_line() + scale_x_continuous(breaks = c(2008:2019)) + scale_y_continuous(breaks = seq(900000, 1300000, by = 100000), labels = seq(9, 13, 1)) + xlab(label = "Year") + ylab(label = "Total in hundred thousands") + ggtitle("Total number of animals vs. Year") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
#Total by year and column
total.plot <- ggplot(data = data.total, aes(x = Year, y = Total, color = pain.level)) + geom_line() + scale_x_continuous(breaks = c(2008:2019)) + scale_y_continuous(breaks = seq(100000, 700000, by = 100000), labels = seq(1, 7, 1)) + xlab(label = "Year") + ylab(label = "Total in hundred thousands") + ggtitle("Total number of animals vs. Year by Pain Levels") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
total.plot
datafr[datafr['State'] == "REPORT TOTAL" & (datafr['Year'] == 2015 | datafr['Year'] == 2016), ]
## State All_Other_Covered_Species Cats Dogs Guinea_Pigs Hamsters
## 1537 REPORT TOTAL 36221 1149 6080 14033 8309
## 1590 REPORT TOTAL 86618 12586 40071 120519 45863
## 1643 REPORT TOTAL 39652 7288 20668 29927 20000
## 1696 REPORT TOTAL 3796 58 362 22418 32557
## 1749 REPORT TOTAL 44318 1276 5781 19953 4210
## 1802 REPORT TOTAL 158863 12784 41835 126355 50178
## 1855 REPORT TOTAL 51237 6171 18369 30440 17568
## 1908 REPORT TOTAL 3434 20 699 24462 33279
## Marine.Mammals Nonhuman_Primates Other_Farm_Animals Pig Rabbits Sheep
## 1537 NA 43634 4419 5654 15662 1364
## 1590 NA 32962 16037 10194 77838 3284
## 1643 NA 27972 10416 35422 53998 7275
## 1696 NA 1016 1333 861 6512 119
## 1749 NA 38632 NA 5833 16124 1105
## 1802 NA 42650 NA 10553 82997 4288
## 1855 NA 27079 NA 35516 51985 7838
## 1908 NA 1272 NA 2168 5743 93
## Total Year Column pain.level
## 1537 136525 2015 B 0
## 1590 445972 2015 C 1
## 1643 252618 2015 D 2
## 1696 69032 2015 E 3
## 1749 137232 2016 B 0
## 1802 530503 2016 C 1
## 1855 246203 2016 D 2
## 1908 71170 2016 E 3
data.total1 <- mutate(data.total1, percentage.change = (Total/lag(Total) -1)*100)
data.total2 <- mutate(data.total1, percentage.change = 100*(Total - first(Total))/Total) #This gives the same as first plot with different y axis
#This tells us that the overall change in 12 years is a 24% reduction.
ggplot(data = data.total1, aes(x = Year, y = percentage.change)) + geom_line(na.rm = TRUE) + scale_x_continuous(breaks = c(2008:2019))
Now we observe a more steady decline in the total number of animals over the years.
#Nested functions for 50 plots
#This chunk not needed?
state_plot <- purrr::map(
.x = unique(datafr$State)[unique(datafr$State) != "REPORT TOTAL"],
.f = function(x){
ggplot(data = datafr %>% filter(State == x), aes(x = Year, y = Total, color = pain.level)) + geom_line() + ggtitle(label = x) +
scale_x_continuous(breaks = c(2008:2019)) +
theme(
plot.title = element_text(size = 20, face = "bold", hjust = 0.5),
axis.title = element_blank(),
axis.text = element_text(size = 8),
legend.position = "none"
)
}
)
ggplot(data = datafr %>% filter(State != "REPORT TOTAL"), aes(x = Year, y = Total, color = pain.level)) + geom_line(size = 2) + ggtitle("Total number of animals used by each state by pain levels") +
scale_x_continuous(breaks = c(2008:2019)) + facet_wrap( ~ State, ncol = 5, scales = "free") + theme(
strip.text.x = element_text(size = 28, color = "blue", face = "bold"),
plot.title = element_text(size = 50, face = "bold", hjust = 0.5),
axis.title = element_blank(),
axis.text = element_text(size = 8),
legend.position = "bottom",
legend.justification = "right",
legend.title = element_text(size = 50),
legend.text = element_text(size = 48)
)
#Bigger legend
#Line patterns - don't bother about y axis, quantity doesn't matter
Doing the same standard deviation analysis for species. Do bar plots for pain levels.
library("ggplot2")
#Std dev of species for top states
data.CA <- filter(datafr, State == "CA") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
#colMeans(data.CA, na.rm = TRUE)
#full join
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd") %>%
arrange(desc(sd)) %>% full_join(
filter(datafr, State == "CA") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = ~mean(x = .x, na.rm = TRUE))) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "mean"), by = "species")
ggplot(data.CA, aes(x = species, y = sd)) + geom_bar(stat="identity", fill="steelblue", width=0.5) + geom_bar(mapping = aes(y = mean), stat="identity", fill="steelblue", width=0.5, alpha = 0.5, position = "dodge") + ggtitle("Standard deviation of species over 12 years in CA") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
data.CA.1 <- filter(datafr, State == "CA") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year, pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
group_by(pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd")
ggplot(data.CA.1, aes(x = pain.level, y = sd, fill = species)) + geom_bar(stat = "identity", width = 0.5) + ggtitle("Standard deviation of species over 12 years in CA by pain level") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
library("ggplot2")
#Std dev of species for top states
data.MA <- filter(datafr, State == "MA") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd") %>%
arrange(desc(sd))
ggplot(data.MA, aes(x = species, y = sd)) + geom_bar(stat="identity", fill="steelblue", width=0.5) + ggtitle("Standard deviation of species over 12 years in MA") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
data.MA.1 <- filter(datafr, State == "MA") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year, pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
group_by(pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd")
ggplot(data.MA.1, aes(x = pain.level, y = sd, fill = species)) + geom_bar(stat = "identity", position = "dodge", width = 0.5) + ggtitle("Standard deviation of species over 12 years in MA by pain level") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
library("ggplot2")
#Std dev of species for top states
data.NJ <- filter(datafr, State == "NJ") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
#colMeans(data.NJ, na.rm = TRUE)
#mean needed?
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd") %>%
arrange(desc(sd))
ggplot(data.NJ, aes(x = species, y = sd)) + geom_bar(stat="identity", fill="steelblue", width=0.5) + ggtitle("Standard deviation of species over 12 years in NJ") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
data.NJ.1 <- filter(datafr, State == "NJ") %>%
filter(State != "REPORT TOTAL") %>%
select(-State) %>%
group_by(Year, pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sum, na.rm = TRUE)) %>%
ungroup() %>%
group_by(pain.level) %>%
summarise(across(.cols = All_Other_Covered_Species:Total, .fns = sd, na.rm = TRUE)) %>%
pivot_longer(cols = All_Other_Covered_Species:Total, names_to = "species", values_to = "sd")
ggplot(data.NJ.1, aes(x = pain.level, y = sd, fill = species)) + geom_bar(stat = "identity", position = "dodge", width = 0.5) + ggtitle("Standard deviation of species over 12 years in NJ by pain level") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text.x = element_text(size = 8, angle = 90, vjust = 0.5, hjust = 0.5),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
m1 <- lm(Total ~ Year, data = data.total1)
summary(m1)
##
## Call:
## lm(formula = Total ~ Year, data = data.total1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -85230 -18560 -5143 25807 52073
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54799007 6887537 7.956 0.0000124 ***
## Year -26705 3421 -7.807 0.0000146 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 40910 on 10 degrees of freedom
## Multiple R-squared: 0.859, Adjusted R-squared: 0.845
## F-statistic: 60.95 on 1 and 10 DF, p-value: 0.00001458
predict(m1, newdata = data.total1)
## 1 2 3 4 5 6 7 8
## 1176308.3 1149603.7 1122899.2 1096194.7 1069490.1 1042785.6 1016081.1 989376.5
## 9 10 11 12
## 962672.0 935967.5 909262.9 882558.4
data.total.1.predict <- data.frame(year = 2019:2024, predict(m1, newdata = tibble(Year = 2019:2024), interval = "predict"))
ggplot(data = data.total1, aes(x = Year, y = Total)) + geom_line() + geom_smooth(formula = "y~x", method = "lm") + geom_line(data = data.total.1.predict, mapping = aes(x = year, y = fit), linetype = "dashed", color = "blue") + scale_x_continuous(breaks = c(2008:2024)) + scale_y_continuous(breaks = seq(700000, 1300000, by = 100000), labels = seq(7, 13, 1)) + xlab(label = "Year") + ylab(label = "Total in hundred thousands") + ggtitle("Total number of animals vs. Year") +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.position = "bottom",
legend.direction = "horizontal",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10)
)
m.quad <- lm(Total ~ Year+Year.sq, data = mutate(data.total1, Year.sq = Year*Year))
summary(m.quad)
##
## Call:
## lm(formula = Total ~ Year + Year.sq, data = mutate(data.total1,
## Year.sq = Year * Year))
##
## Residuals:
## Min 1Q Median 3Q Max
## -80005 -24414 -3277 28059 54767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2245798614.7 4728853108.7 0.475 0.646
## Year -2203020.4 4697157.1 -0.469 0.650
## Year.sq 540.4 1166.4 0.463 0.654
##
## Residual standard error: 42610 on 9 degrees of freedom
## Multiple R-squared: 0.8623, Adjusted R-squared: 0.8317
## F-statistic: 28.19 on 2 and 9 DF, p-value: 0.0001333
#make another column to scale year 0-12?